{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Abstractive"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "This tutorial is available as an IPython notebook at [Malaya/example/keyword-abstractive](https://github.com/huseinzol05/Malaya/tree/master/example/keyword-abstractive).\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "\n",
    "This module only trained on standard language structure, so it is not save to use it for local language structure.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "\n",
    "This module trained heavily on news and articles structure.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "\n",
    "Results generated using stochastic methods.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ['CUDA_VISIBLE_DEVICES'] = ''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/husein/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.\n",
      "  warn(\"The installed version of bitsandbytes was compiled without GPU support. \"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/husein/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32\n",
      "CPU times: user 2.9 s, sys: 3.81 s, total: 6.71 s\n",
      "Wall time: 1.94 s\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3397\n",
      "  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n",
      "/home/husein/dev/malaya/malaya/tokenizer.py:214: FutureWarning: Possible nested set at position 3927\n",
      "  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "import malaya\n",
    "from pprint import pprint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# minimum cleaning, just simply to remove newlines.\n",
    "def cleaning(string):\n",
    "    string = string.replace('\\n', ' ')\n",
    "    string = re.sub(r'[ ]+', ' ', string).strip()\n",
    "    return string"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I am going to simply copy paste some local news into this notebook. I will search about `isu mahathir` in google news, [link here](https://www.google.com/search?q=isu+mahathir&sxsrf=ALeKk02V_bAJC3sSrV38JQgGYWL_mE0biw:1589951900053&source=lnms&tbm=nws&sa=X&ved=2ahUKEwjapNmx2MHpAhVp_XMBHRt7BEQQ_AUoAnoECCcQBA&biw=1440&bih=648&dpr=2)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**link**: https://www.hmetro.com.my/mutakhir/2020/05/580438/peletakan-jawatan-tun-m-ditolak-bukan-lagi-isu\n",
    "\n",
    "**Title**: Peletakan jawatan Tun M ditolak, bukan lagi isu.\n",
    "\n",
    "**Body**: PELETAKAN jawatan Tun Dr Mahathir Mohamad sebagai Pengerusi Parti Pribumi Bersatu Malaysia (Bersatu) ditolak di dalam mesyuarat khas Majlis Pimpinan Tertinggi (MPT) pada 24 Februari lalu.\n",
    "\n",
    "Justeru, tidak timbul soal peletakan jawatan itu sah atau tidak kerana ia sudah pun diputuskan pada peringkat parti yang dipersetujui semua termasuk Presiden, Tan Sri Muhyiddin Yassin.\n",
    "\n",
    "Bekas Setiausaha Agung Bersatu Datuk Marzuki Yahya berkata, pada mesyuarat itu MPT sebulat suara menolak peletakan jawatan Dr Mahathir.\n",
    "\n",
    "\"Jadi ini agak berlawanan dengan keputusan yang kita sudah buat. Saya tak faham bagaimana Jabatan Pendaftar Pertubuhan Malaysia (JPPM) kata peletakan jawatan itu sah sedangkan kita sudah buat keputusan di dalam mesyuarat, bukan seorang dua yang buat keputusan.\n",
    "\n",
    "\"Semua keputusan mesti dibuat melalui parti. Walau apa juga perbincangan dibuat di luar daripada keputusan mesyuarat, ini bukan keputusan parti.\n",
    "\n",
    "\"Apa locus standy yang ada pada Setiausaha Kerja untuk membawa perkara ini kepada JPPM. Seharusnya ia dibawa kepada Setiausaha Agung sebagai pentadbir kepada parti,\" katanya kepada Harian Metro.\n",
    "\n",
    "Beliau mengulas laporan media tempatan hari ini mengenai pengesahan JPPM bahawa Dr Mahathir tidak lagi menjadi Pengerusi Bersatu berikutan peletakan jawatannya di tengah-tengah pergolakan politik pada akhir Februari adalah sah.\n",
    "\n",
    "Laporan itu juga menyatakan, kedudukan Muhyiddin Yassin memangku jawatan itu juga sah.\n",
    "\n",
    "Menurutnya, memang betul Dr Mahathir menghantar surat peletakan jawatan, tetapi ditolak oleh MPT.\n",
    "\n",
    "\"Fasal yang disebut itu terpakai sekiranya berhenti atau diberhentikan, tetapi ini mesyuarat sudah menolak,\" katanya.\n",
    "\n",
    "Marzuki turut mempersoal kenyataan media yang dibuat beberapa pimpinan parti itu hari ini yang menyatakan sokongan kepada Perikatan Nasional.\n",
    "\n",
    "\"Kenyataan media bukanlah keputusan rasmi. Walaupun kita buat 1,000 kenyataan sekali pun ia tetap tidak merubah keputusan yang sudah dibuat di dalam mesyuarat. Kita catat di dalam minit apa yang berlaku di dalam mesyuarat,\" katanya."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "string = \"\"\"\n",
    "PELETAKAN jawatan Tun Dr Mahathir Mohamad sebagai Pengerusi Parti Pribumi Bersatu Malaysia (Bersatu) ditolak di dalam mesyuarat khas Majlis Pimpinan Tertinggi (MPT) pada 24 Februari lalu.\n",
    "\n",
    "Justeru, tidak timbul soal peletakan jawatan itu sah atau tidak kerana ia sudah pun diputuskan pada peringkat parti yang dipersetujui semua termasuk Presiden, Tan Sri Muhyiddin Yassin.\n",
    "\n",
    "Bekas Setiausaha Agung Bersatu Datuk Marzuki Yahya berkata, pada mesyuarat itu MPT sebulat suara menolak peletakan jawatan Dr Mahathir.\n",
    "\n",
    "\"Jadi ini agak berlawanan dengan keputusan yang kita sudah buat. Saya tak faham bagaimana Jabatan Pendaftar Pertubuhan Malaysia (JPPM) kata peletakan jawatan itu sah sedangkan kita sudah buat keputusan di dalam mesyuarat, bukan seorang dua yang buat keputusan.\n",
    "\n",
    "\"Semua keputusan mesti dibuat melalui parti. Walau apa juga perbincangan dibuat di luar daripada keputusan mesyuarat, ini bukan keputusan parti.\n",
    "\n",
    "\"Apa locus standy yang ada pada Setiausaha Kerja untuk membawa perkara ini kepada JPPM. Seharusnya ia dibawa kepada Setiausaha Agung sebagai pentadbir kepada parti,\" katanya kepada Harian Metro.\n",
    "\n",
    "Beliau mengulas laporan media tempatan hari ini mengenai pengesahan JPPM bahawa Dr Mahathir tidak lagi menjadi Pengerusi Bersatu berikutan peletakan jawatannya di tengah-tengah pergolakan politik pada akhir Februari adalah sah.\n",
    "\n",
    "Laporan itu juga menyatakan, kedudukan Muhyiddin Yassin memangku jawatan itu juga sah.\n",
    "\n",
    "Menurutnya, memang betul Dr Mahathir menghantar surat peletakan jawatan, tetapi ditolak oleh MPT.\n",
    "\n",
    "\"Fasal yang disebut itu terpakai sekiranya berhenti atau diberhentikan, tetapi ini mesyuarat sudah menolak,\" katanya.\n",
    "\n",
    "Marzuki turut mempersoal kenyataan media yang dibuat beberapa pimpinan parti itu hari ini yang menyatakan sokongan kepada Perikatan Nasional.\n",
    "\n",
    "\"Kenyataan media bukanlah keputusan rasmi. Walaupun kita buat 1,000 kenyataan sekali pun ia tetap tidak merubah keputusan yang sudah dibuat di dalam mesyuarat. Kita catat di dalam minit apa yang berlaku di dalam mesyuarat,\" katanya.\n",
    "\"\"\"\n",
    "\n",
    "string = cleaning(string)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Link**: https://www.malaysiakini.com/news/525953\n",
    "\n",
    "**Title**: Mahathir jangan hipokrit isu kes mahkamah Riza, kata Takiyuddin\n",
    "\n",
    "**Body**: Menteri undang-undang Takiyuddin Hassan berkata kerajaan berharap Dr Mahathir Mohamad tidak bersikap hipokrit dengan mengatakan beliau tertanya-tanya dan tidak faham dengan keputusan mahkamah melepas tanpa membebaskan (DNAA) Riza Aziz, anak tiri bekas perdana menteri Najib Razak, dalam kes pengubahan wang haram membabitkan dana 1MDB.\n",
    "\n",
    "Pemimpin PAS itu berkata ini kerana keputusan itu dibuat oleh peguam negara dan dilaksanakan oleh timbalan pendakwa raya yang mengendalikan kes tersebut pada akhir 2019.\n",
    "\n",
    "“Saya merujuk kepada kenyataan Dr Mahathir tentang tindakan Mahkamah Sesyen memberikan pelepasan tanpa pembebasan (discharge not amounting to acquittal) kepada Riza Aziz baru-baru ini.\n",
    "\n",
    "“Kerajaan berharap Dr Mahathir tidak bersikap hipokrit dengan mengatakan beliau ‘tertanya-tanya’, keliru dan tidak faham terhadap suatu keputusan yang dibuat oleh Peguam Negara dan dilaksanakan oleh Timbalan Pendakwa Raya yang mengendalikan kes ini pada akhir tahun 2019,” katanya dalam satu kenyataan hari ini.\n",
    "\n",
    "Riza pada Khamis dilepas tanpa dibebaskan daripada lima pertuduhan pengubahan wang berjumlah AS$248 juta (RM1.08 bilion).\n",
    "\n",
    "Dalam persetujuan yang dicapai antara pihak Riza dan pendakwaan, beliau dilepas tanpa dibebaskan atas pertuduhan itu dengan syarat memulangkan semula aset dari luar negara dengan nilai anggaran AS$107.3 juta (RM465.3 juta).\n",
    "\n",
    "Ekoran itu, Mahathir antara lain menyuarakan kekhuatirannya berkenaan persetujuan itu dan mempersoalkan jika pihak yang didakwa atas tuduhan mencuri boleh terlepas daripada tindakan jika memulangkan semula apa yang dicurinya.\n",
    "\n",
    "\"Dia curi berbilion-bilion...Dia bagi balik kepada kerajaan. Dia kata kepada kerajaan, 'Nah, duit yang aku curi. Sekarang ini, jangan ambil tindakan terhadap aku.' Kita pun kata, 'Sudah bagi balik duit okey lah',\" katanya.\n",
    "\n",
    "Menjelaskan bahawa beliau tidak mempersoalkan keputusan mahkamah, Mahathir pada masa sama berkata ia menunjukkan undang-undang mungkin perlu dipinda.\n",
    "\n",
    "Mengulas lanjut, Takiyuddin yang juga setiausaha agung PAS berkata\n",
    "kenyataan Mahathir tidak munasabah sebagai bekas perdana menteri.\n",
    "\n",
    "\"Kerajaan berharap Dr Mahathir tidak terus bertindak mengelirukan rakyat dengan mengatakan beliau ‘keliru’.\n",
    "\n",
    "“Kerajaan PN akan terus bertindak mengikut undang-undang dan berpegang kepada prinsip kebebasan badan kehakiman dan proses perundangan yang sah,” katanya."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "string2 = \"\"\"\n",
    "Menteri undang-undang Takiyuddin Hassan berkata kerajaan berharap Dr Mahathir Mohamad tidak bersikap hipokrit dengan mengatakan beliau tertanya-tanya dan tidak faham dengan keputusan mahkamah melepas tanpa membebaskan (DNAA) Riza Aziz, anak tiri bekas perdana menteri Najib Razak, dalam kes pengubahan wang haram membabitkan dana 1MDB.\n",
    "\n",
    "Pemimpin PAS itu berkata ini kerana keputusan itu dibuat oleh peguam negara dan dilaksanakan oleh timbalan pendakwa raya yang mengendalikan kes tersebut pada akhir 2019.\n",
    "\n",
    "“Saya merujuk kepada kenyataan Dr Mahathir tentang tindakan Mahkamah Sesyen memberikan pelepasan tanpa pembebasan (discharge not amounting to acquittal) kepada Riza Aziz baru-baru ini.\n",
    "\n",
    "“Kerajaan berharap Dr Mahathir tidak bersikap hipokrit dengan mengatakan beliau ‘tertanya-tanya’, keliru dan tidak faham terhadap suatu keputusan yang dibuat oleh Peguam Negara dan dilaksanakan oleh Timbalan Pendakwa Raya yang mengendalikan kes ini pada akhir tahun 2019,” katanya dalam satu kenyataan hari ini.\n",
    "\n",
    "Riza pada Khamis dilepas tanpa dibebaskan daripada lima pertuduhan pengubahan wang berjumlah AS$248 juta (RM1.08 bilion).\n",
    "\n",
    "Dalam persetujuan yang dicapai antara pihak Riza dan pendakwaan, beliau dilepas tanpa dibebaskan atas pertuduhan itu dengan syarat memulangkan semula aset dari luar negara dengan nilai anggaran AS$107.3 juta (RM465.3 juta).\n",
    "\n",
    "Ekoran itu, Mahathir antara lain menyuarakan kekhuatirannya berkenaan persetujuan itu dan mempersoalkan jika pihak yang didakwa atas tuduhan mencuri boleh terlepas daripada tindakan jika memulangkan semula apa yang dicurinya.\n",
    "\n",
    "\"Dia curi berbilion-bilion...Dia bagi balik kepada kerajaan. Dia kata kepada kerajaan, 'Nah, duit yang aku curi. Sekarang ini, jangan ambil tindakan terhadap aku.' Kita pun kata, 'Sudah bagi balik duit okey lah',\" katanya.\n",
    "\n",
    "Menjelaskan bahawa beliau tidak mempersoalkan keputusan mahkamah, Mahathir pada masa sama berkata ia menunjukkan undang-undang mungkin perlu dipinda.\n",
    "\n",
    "Mengulas lanjut, Takiyuddin yang juga setiausaha agung PAS berkata\n",
    "kenyataan Mahathir tidak munasabah sebagai bekas perdana menteri.\n",
    "\n",
    "\"Kerajaan berharap Dr Mahathir tidak terus bertindak mengelirukan rakyat dengan mengatakan beliau ‘keliru’.\n",
    "\n",
    "“Kerajaan PN akan terus bertindak mengikut undang-undang dan berpegang kepada prinsip kebebasan badan kehakiman dan proses perundangan yang sah,” katanya.\n",
    "\"\"\"\n",
    "\n",
    "string2 = cleaning(string2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "string3 = \"\"\"\n",
    "The future looks bleak for Gerakan Tanah Air (GTA), as the possibility of building alliances with other political coalitions slowly withers.\n",
    "\n",
    "Experts believe the unofficial pact is in for a dismal showing and may end up being \"buried\" in the 15th General Election (GE15).\n",
    "\n",
    "They said GTA, which aims to contest more than 120 seats, might end up securing only five seats at most despite having a notable figure — former prime minister Tun Dr Mahathir Mohamad — as its chairman.\n",
    "\n",
    "Associate Professor Dr Awang Azman Awang Pawi said when Dr Mahathir headed Parti Pribumi Bersatu Malaysia (Bersatu) in the previous general election, the party secured only 13 seats.\n",
    "\n",
    "\"Dr Mahathir won only 13 when he was with Bersatu. It is likely that he will do much less under GTA.\n",
    "\n",
    "\"GTA's political future will likely be buried sooner than it thinks,\" he told the New Sunday Times yesterday.\n",
    "\n",
    "The political analyst from Universiti Malaya said GTA's plan to contest 120 seats would be an obstacle for the pact, as the political coalition would find it hard to find worthy and acceptable candidates.\n",
    "\n",
    "GTA is made up of Parti Pejuang Tanah Air (Pejuang), Parti Bumiputera Perkasa Malaysia, Barisan Jemaah Islamiah Se-Malaysia and Parti Ikatan India Muslim Nasional.\n",
    "\n",
    "GTA had previously courted Perikatan Nasional (PN) and Pakatan Harapan (PH) in the run-up to the election.\n",
    "\n",
    "Dr Mahathir had earlier last week said he was willing to \"set his ego aside\" to unite the opposition parties to offer a strong challenge against Barisan Nasional (BN).\n",
    "\n",
    "Last Thursday, PN shut its doors to any cooperation with GTA for GE15, saying it would move ahead with its component parties.\n",
    "\n",
    "So far, GTA has announced eight candidates for GE15: Dr Mahathir for the Langkawi parliamentary seat, Datuk Seri Mukhriz Mahathir for Jerlun, Datuk Amiruddin Hamzah for Kubang Pasu, Datin Che Asmah Ibrahim for Sepang, Mior Nor Haidir Suhaimi for Tapah, Harumaini Omar for Hulu Selangor, Mohd Shaid Rosli for Kuala Selangor and Datuk Seri Khairuddin Abu Hassan for Titiwangsa.\n",
    "\n",
    "Another political analyst, Azmi Hassan, predicted that GTA would have to go it alone, and did not see any chance that Pakatan Harapan (PH) would accept the pact into its fold.\n",
    "\n",
    "He said GTA had little to offer to PH.\n",
    "\n",
    "\"They (GTA) have nothing to add to PH, unlike the Malaysian United Democratic Alliance.\n",
    "\n",
    "\"GTA has no political history, and its performance in the Johor election showed how weak Pejuang's influence is.\"\n",
    "\n",
    "He said GTA should concentrate in areas that it could perform well, such as Kedah where Dr Mahathir and Mukhriz would defend their seats.\n",
    "\"\"\"\n",
    "\n",
    "string3 = cleaning(string3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### List available HuggingFace models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'mesolitica/finetune-keyword-t5-small-standard-bahasa-cased': {'Size (MB)': 242,\n",
       "  'f1': 0.3291554473802324,\n",
       "  'Suggested length': 1024},\n",
       " 'mesolitica/finetune-keyword-t5-base-standard-bahasa-cased': {'Size (MB)': 892,\n",
       "  'f1': 0.3367989506031038,\n",
       "  'Suggested length': 1024}}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.keyword.abstractive.available_huggingface"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load HuggingFace model\n",
    "\n",
    "```python\n",
    "def huggingface(\n",
    "    model: str = 'mesolitica/finetune-keyword-t5-small-standard-bahasa-cased',\n",
    "    force_check: bool = True,\n",
    "    **kwargs,\n",
    "):\n",
    "    \"\"\"\n",
    "    Load HuggingFace model to abstractive keyword.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    model: str, optional (default='mesolitica/finetune-keyword-t5-small-standard-bahasa-cased')\n",
    "        Check available models at `malaya.keyword.abstractive.available_huggingface()`.\n",
    "    force_check: bool, optional (default=True)\n",
    "        Force check model one of malaya model.\n",
    "        Set to False if you have your own huggingface model.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    result: malaya.torch_model.huggingface.Keyword\n",
    "    \"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = malaya.keyword.abstractive.huggingface()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Keyword\n",
    "\n",
    "```python\n",
    "def generate(\n",
    "    self,\n",
    "    strings: List[str],\n",
    "    top_keywords: int = 5,\n",
    "    **kwargs,\n",
    "):\n",
    "    \"\"\"\n",
    "    Generate texts from the input.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    strings : List[str]\n",
    "    top_keywords: int, optional (default=5)\n",
    "    **kwargs: vector arguments pass to huggingface `generate` method.\n",
    "        Read more at https://huggingface.co/docs/transformers/main_classes/text_generation\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    result: List[str]\n",
    "    \"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "spaces_between_special_tokens is deprecated and will be removed in transformers v5. It was adding spaces between `added_tokens`, not special tokens, and does not exist in our fast implementation. Future tokenizers will handle the decoding process on a per-model rule.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['letak jawatan',\n",
      "  'PPBM',\n",
      "  'Majlis Pimpinan Tertinggi',\n",
      "  'MPT',\n",
      "  'politik',\n",
      "  'presiden',\n",
      "  'mesyuarat khas',\n",
      "  'Marzuki Yahya',\n",
      "  'JPPM',\n",
      "  'Jawatankuasa Pendaftar Pertubuhan',\n",
      "  'Jawatankuasa Kerja',\n",
      "  'Mukhriz Mahathir',\n",
      "  'Perdana Menteri',\n",
      "  'Muhyiddin Yassin',\n",
      "  'Perikatan Nasional',\n",
      "  'kerajaan',\n",
      "  'Mahathir Mohamad',\n",
      "  'Bersatu'],\n",
      " ['Najib Razak',\n",
      "  'Dr Mahathir Mohamad',\n",
      "  '1MDB',\n",
      "  'mahkamah',\n",
      "  'undang-undang',\n",
      "  'pengubahan wang haram',\n",
      "  'Peguam Negara',\n",
      "  'Takiyuddin Hassan',\n",
      "  'Riza Aziz',\n",
      "  'kerajaan'],\n",
      " ['General Election',\n",
      "  'Awang Azman Awang Pawi',\n",
      "  'GE14Kedah',\n",
      "  'Dr Mahathir Mohamad',\n",
      "  'Perikatan Nasional',\n",
      "  'Pejuang',\n",
      "  'GE15',\n",
      "  'Pakatan Harapan',\n",
      "  'GTA',\n",
      "  'Malaysian United Democratic Alliance',\n",
      "  'Malaysia Votes',\n",
      "  'PH',\n",
      "  'Parti Pribumi Bersatu Malaysia',\n",
      "  'PN']]\n",
      "CPU times: user 14.8 s, sys: 0 ns, total: 14.8 s\n",
      "Wall time: 1.28 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "pprint(model.generate([string, string2, string3], max_length = 256, top_keywords = 20))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Good thing about HuggingFace\n",
    "\n",
    "In `generate` method, you can do greedy, beam, sampling, nucleus decoder and so much more, read it at https://huggingface.co/blog/how-to-generate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['Melayu',\n",
      "  'Pakatan Harapan',\n",
      "  'jppm',\n",
      "  'letak jawatan',\n",
      "  'PPBM',\n",
      "  'Jawatankuasa Pendaftar Pertubuhan MTUC',\n",
      "  'Parti Pribumi Bersatu Malaysia',\n",
      "  'Marzuki Yahya',\n",
      "  'Mahathir Mohamad',\n",
      "  'politik',\n",
      "  'UMNO',\n",
      "  'Mukhriz Mahathir',\n",
      "  'Muhyiddin Yassin',\n",
      "  'Jabatan Pendaftar Pertubuhan Malaysia',\n",
      "  'MPT',\n",
      "  'Barisan Nasional',\n",
      "  'Mesyuarat Khas Majlis Pimpinan Tertinggi',\n",
      "  'PAS',\n",
      "  'Perikatan Nasional',\n",
      "  'Bersatu'],\n",
      " ['tuduhan curi',\n",
      "  'Dr Mahathir Mohamad',\n",
      "  '1MDB',\n",
      "  'lepas tanpa bebas',\n",
      "  'pelepasan tanpa bebas',\n",
      "  'Peguam Negara',\n",
      "  'perlembagaan',\n",
      "  'ketidaksamaan',\n",
      "  'tempatan',\n",
      "  'persetujuan',\n",
      "  'Najib Razak',\n",
      "  'DNAA',\n",
      "  'pengubahan wang haram',\n",
      "  'Riza Aziz',\n",
      "  'kebebasan',\n",
      "  'Umno',\n",
      "  'undang-undang',\n",
      "  'top bm',\n",
      "  'Takiyuddin Hassan'],\n",
      " ['Mahathir-Anwar',\n",
      "  'GE15',\n",
      "  'Pakatan Harapan',\n",
      "  'GTA',\n",
      "  'pilihan raya umum',\n",
      "  'PH',\n",
      "  'Parti Pribumi Bersatu Malaysia',\n",
      "  'Mahathir Mohamad',\n",
      "  'General Election',\n",
      "  'Awang Azman Awang Pawi',\n",
      "  'Mukhriz Mahathir',\n",
      "  'Azmin Ali',\n",
      "  'Barisan Nasional',\n",
      "  'Khairuddin Abu Hassan',\n",
      "  'Amiruddin Hamzah',\n",
      "  'Malaysia Memilih',\n",
      "  'Perikatan Nasional',\n",
      "  'Bersatu',\n",
      "  'PRU-15']]\n",
      "CPU times: user 49.6 s, sys: 635 ms, total: 50.2 s\n",
      "Wall time: 4.24 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "pprint(model.generate([string, string2, string3], max_length = 256, top_keywords = 20,\n",
    "                     penalty_alpha=0.6, top_k=4))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
